Method and apparatus for visual lossless image syntactic encoding

ABSTRACT

A visual perception threshold unit for image processing identifies a plurality of visual perception threshold levels to be associated with the pixels of a video frame, wherein the threshold levels define contrast levels above which a human eye can distinguish a pixel from among its neighboring pixels of the video frame. The present invention also includes a method of generating visual perception thresholds by analysis of the details of the video frames, estimating the parameters of the details, and defining a visual perception threshold for each detail in accordance with the estimated detail parameters. The present invention further includes a method of describing images by determining which details in the image can be distinguished by the human eye and which ones can only be detected by it.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. Ser. No. 10/121,685, filed Apr. 15, 2002, now U.S. Pat. No. 6,952,500, which is a continuation application of U.S. Ser. No. 09/524,618, filed Mar. 14, 2000, issued as U.S. Pat. No. 6,473,532, which patents are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates generally to processing of video images and, in particular, to syntactic encoding of images for later compression by standard compression techniques.

BACKGROUND OF THE INVENTION

There are many types of video signals, such as digital broadcast television (TV), video conferencing, interactive TV, etc. All of these signals, in their digital form, are divided into frames, each of which consists of many pixels (image elements), each of which requires 8-24 bits to describe it. The result is megabits of data per frame.

Before these signals are stored and/or transmitted, they typically are compressed, using one of many standard video compression techniques, such as JPEG, MPEG, H-compression, etc. These compression standards use video signal transforms and intra- and inter-frame coding which exploit spatial and temporal correlations among pixels of a frame and across frames.

However, these compression techniques create a number of well-known, undesirable and unacceptable artifacts, such as blockiness, low resolution and wiggles, among others. These are particularly problematic for broadcast TV (satellite TV, cable TV, etc.) or for systems with very low bit rates (video conferencing, videophone).

Much research has been performed to try to improve the standard compression techniques. The following patents and articles discuss various prior art methods to do so:

U.S. Pat. Nos. 5,870,501, 5,847,766, 5,845,012, 5,796,864, 5,774,593, 5,586,200, 5,491,519, 5,341,442;

Raj Talluri et al., “A Robust, Scalable, Object-Based Video Compression Technique for Very Low Bit-Rate Coding,” IEEE Transactions of Circuit and Systems for Video Technology, vol. 7, No. 1, February 1997;

Awad Kh. Al-Asmari, “An Adaptive Hybrid Coding Scheme for HDTV and Digital Sequences,” IEEE Transactions on Consumer Electronics, vol. 42, No. 3, pp. 926-936, August 1995;

Kwok-tung Lo and Jian Feng, “Predictive Mean Search Algorithms for Fast VQ Encoding of Images,” IEEE Transactions on Consumer Electronics, vol. 41, No. 2, pp. 327-331, May 1995;

James Goel et al., “Pre-processing for MPEG Compression Using Adaptive Spatial Filtering,” IEEE Transactions on Consumer Electronics, vol. 41, No. 3, pp. 687-698, August 1995;

Jian Feng et al., “Motion Adaptive Classified Vector Quantization for ATM Video Coding,” IEEE Transactions on Consumer Electronics, vol. 41, No. 2, pp. 322-326, May 1995;

Austin Y. Lan et al., “Scene-Context Dependent Reference-Frame Placement for MPEG Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, No. 3, pp. 478-489, April 1999;

Kuo-Chin Fan and Kou-Sou Kan, “An Active Scene Analysis-Based Approach for Pseudoconstant Bit-Rate Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, No. 2, pp. 159-170, April 1998;

Takashi Ida and Yoko Sambansugi, “Image Segmentation and Contour Detection Using Fractal Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, No. 8, pp. 968-975, December 1998;

Liang Shen and Rangaraj M. Rangayyan, “A Segmentation-Based Lossless Image Coding Method for High-Resolution Medical Image Compression,” IEEE Transactions on Medical Imaging, vol. 16, No. 3, pp. 301-316, June 1997;

Adrian Munteanu et al., “Wavelet-Based Lossless Compression of Coronary Angiographic Images,” IEEE Transactions on Medical Imaging, vol. 18, No. 3, pp. 272-281, March 1999; and

Akira Okumura et al., “Signal Analysis and Compression Performance Evaluation of Pathological Microscopic Images,” IEEE Transactions on Medical Imaging, vol. 16, No. 6, pp. 701-710, December 1997.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a method and apparatus for video compression which is generally lossless vis-à-vis what the human eye perceives.

There is therefore provided, in accordance with a preferred embodiment of the present invention, a visual perception threshold unit for image processing. The threshold unit identifies a plurality of visual perception threshold levels to be associated with the pixels of a video frame, wherein the threshold levels define contrast levels above which a human eye can distinguish a pixel from among its neighboring pixels of the video frame.

There is also provided, in accordance with a preferred embodiment of the present invention, the visual perception threshold unit which includes a parameter generator and a threshold generator. The parameter generator generates a multiplicity of parameters that describe at least some of the information content of the processed frame. From the parameters, the threshold generator generates a plurality of visual perception threshold levels to be associated with the pixels of the video frame. The threshold levels define contrast levels above which a human eye can distinguish a pixel from among its neighboring pixels of the frame.

Moreover, in accordance with a preferred embodiment of the present invention, the parameter generator includes a volume unit, a color unit, an intensity unit or some combination of the three. The volume unit determines the volume of information in the frame, the color unit determines the per pixel color and the intensity unit determines a cross-frame change of intensity.

There is also provided, in accordance with a preferred embodiment of the present invention, a method for generating visual perception thresholds. The method includes analysis of the details of the frames of a video signal, estimating the parameters of the details, and defining a visual perception threshold for each detail in accordance with the estimated detail parameters.

There is also provided, in accordance with a preferred embodiment of the present invention, a method for describing images. The method includes determining which details in the image can be distinguished by the human eye and which ones can only be detected by it.

Moreover, in accordance with a preferred embodiment of the present invention, the method also includes providing one bit to describe a pixel which can only be detected by the human eye, and providing three bits to describe a pixel which can be distinguished by the human eye.

Further, in accordance with a preferred embodiment of the present invention, the method also includes smoothing the data of less-distinguished details.

Finally, in accordance with a preferred embodiment of the present invention, the step of determining details also includes identifying areas of high contrast and areas whose details have small dimensions.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the appended drawings in which:

FIG. 1 is an example of a video frame;

FIG. 2 is a block diagram illustration of a video compression system having a visual lossless syntactic encoder, constructed and operative in accordance with a preferred embodiment of the present invention;

FIG. 3 is a block diagram illustration of the details of the visual lossless syntactic encoder of FIG. 2;

FIG. 4 is a graphical illustration of the transfer functions for a number of high pass filters useful in the syntactic encoder of FIG. 3;

FIGS. 5A and 5B are block diagram illustrations of alternative embodiments of a controllable filter bank forming part of the syntactic encoder of FIG. 3;

FIG. 6 is a graphical illustration of the transfer functions for a number of low pass filters useful in the controllable filter bank of FIGS. 5A and 5B;

FIG. 7 is a graphical illustration of the transfer function for a non-linear filter useful in the controllable filter bank of FIGS. 5A and 5B;

FIGS. 8A, 8B and 8C are block diagram illustrations of alternative embodiments of an inter-frame processor forming a controlled filter portion of the syntactic encoder of FIG. 3;

FIG. 9 is a block diagram illustration of a spatial-temporal analyzer forming part of the syntactic encoder of FIG. 3;

FIGS. 10A and 10B are detail illustrations of the analyzer of FIG. 9; and

FIG. 11 is a detail illustration of a frame analyzer forming part of the syntactic encoder of FIG. 3.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

Applicants have realized that there are different levels of image detail in an image and that the human eye perceives these details in different ways. In particular, Applicants have realized the following:

    1. Picture details whose detection mainly depends on the level of noise in the image occupy approximately 50-80% of an image.
    2. A visual perception detection threshold for image details does not depend on the shape of the details in the image.
    3. A visual perception threshold THD depends on a number of picture parameters, including the general brightness of the image. It does not depend on the noise spectrum.

The present invention is a method for describing, and then encoding, images based on which details in the image can be distinguished by the human eye and which ones can only be detected by it.

Reference is now made to FIG. 1, which is a grey-scale image of a plurality of shapes of a bird in flight, ranging from a photograph of one (labeled 10) to a very stylized version of one (labeled 12). The background of the image is very dark at the top of the image and very light at the bottom of the image.

The human eye can distinguish most of the birds of the image. However, there is at least one bird, labeled 14, which the eye can detect but whose relative contrast details it cannot fully determine. Furthermore, there are large swaths of the image (in the background) which have no details in them.

The present invention is a method and system for syntactic encoding of video frames before they are sent to a standard video compression unit. The present invention separates the details of a frame into two different types, those that can only be detected (for which only one bit will suffice to describe each of their pixels) and those which can be distinguished (for which at least three bits are needed to describe the intensity of each of their pixels).

Reference is now made to FIG. 2, which illustrates the present invention within an image transmission system. Thus, FIG. 2 shows a visual lossless syntactic (VLS) encoder 20 connected to a standard video transmitter 22 which includes a video compression encoder 24, such as a standard MPEG encoder, and a modulator 26. VLS encoder 20 transforms an incoming video signal such that video compression encoder 24 can compress the video signal two to five times more than video compression encoder 24 can do on its own, resulting in a significantly reduced volume bit stream to be transmitted.

Modulator 26 modulates the reduced volume bit stream and transmits it to a receiver 30, which, as in the prior art, includes a demodulator 32 and a decoder 34. Demodulator 32 demodulates the transmitted signal and decoder 34 decodes and decompresses the demodulated signal. The result is provided to a monitor 36 for display.

It will be appreciated that, although the compression ratios are high in the present invention, the resultant video displayed on monitor 36 is not visually degraded. This is because encoder 20 attempts to quantify each frame of the video signal according to which sections of the frame are more or less distinguished by the human eye. For the less-distinguished sections, encoder 20 either provides pixels of a minimum bit volume, thus reducing the overall bit volume of the frame, or smooths the data of the sections such that video compression encoder 24 will later significantly compress these sections, thus resulting in a smaller bit volume in the compressed frame. Since the human eye does not distinguish these sections, the reproduced frame is not perceived significantly differently than the original frame, despite its smaller bit volume.

Reference is now made to FIG. 3, which details the elements of VLS encoder 20. Encoder 20 comprises an input frame memory 40, a frame analyzer 42, an intra-frame processor 44, an output frame memory 46 and an inter-frame processor 48. Analyzer 42 analyzes each frame to separate it into subclasses, where subclasses define areas whose pixels cannot be distinguished from each other. Intra-frame processor 44 spatially filters each pixel of the frame according to its subclass and, optionally, also provides each pixel of the frame with the appropriate number of bits. Inter-frame processor 48 provides temporal filtering (i.e. inter-frame filtering) and updates output frame memory 46 with the elements of the current frame which are different than those of the previous frame.

It is noted that frames are composed of pixels, each having luminance Y and two chrominance C_(r) and C_(b) components, each of which is typically defined by eight bits. VLS encoder 20 generally separately processes the three components. However, the bandwidth of the chrominance signals is half as wide as that of the luminance signal. Thus, the filters (in the x direction of the frame) for chrominance have a narrower bandwidth. The following discussion shows the filters for the luminance signal Y.

Frame analyzer 42 comprises a spatial-temporal analyzer 50, a parameter estimator 52, a visual perception threshold determiner 54 and a subclass determiner 56. Details of these elements are provided in FIGS. 9-11, discussed hereinbelow.

As discussed hereinabove, details which the human eye distinguishes are ones of high contrast and ones whose details have small dimensions. Areas of high contrast are areas with a lot of high frequency content. Thus, spatial-temporal analyzer 50 generates a plurality of filtered frames from the current frame, each filtered through a different high pass filter (HPF), where each high pass filter retains a different range of frequencies therein.

FIG. 4, to which reference is now briefly made, is an amplitude vs. frequency graph illustrating the transfer functions of an exemplary set of high pass filters for frames in a non-interlacing scan format. Four graphs are shown. It can be seen that the curve labeled HPF-R3 has a cutoff frequency of 1 MHz and thus, retains portions of the frame with information above 1 MHz. Similarly, curve HPF-R2 has a cutoff frequency of 2 MHz, HPF-C2 has a cutoff frequency of 3 MHz and HPF-R1 and HPF-C1 have a cutoff frequency of 4 MHz. As will be discussed hereinbelow, the terminology “Rx” refers to operations on a row of pixels while the terminology “Cx” refers to operations on a column of pixels.

In particular, the filters of FIG. 4 implement the following finite impulse response (FIR) filters on either a row of pixels (the x direction of the frame) or a column of pixels (the y direction of the frame), where the number of pixels used in the filter defines the power of the cosine. For example, a filter implementing cos¹⁰ x takes 10 pixels around the pixel of interest, five to one side and five to the other side of the pixel of interest.

    HPF-R3: 1−cos¹⁰ x
    HPF-R2: 1−cos⁶ x
    HPF-R1: 1−cos² x
    HPF-C2: 1−cos⁴ y
    HPF-C1: 1−cos² y
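The filter list above can be sketched in code under one common reading, which is an assumption and not stated in the text: a low pass response of the form cosⁿ corresponds to a binomial kernel of n+1 taps, and the high pass filter 1−cosⁿ is a unit impulse minus that kernel. The names `hpf_kernel` and `filter_row`, and the edge replication at the borders of a row, are hypothetical choices made for illustration.

```python
from math import comb


def hpf_kernel(power):
    """Taps for a 1 - cos^power high pass filter.

    Assumption: cos^power is realized as a binomial low pass kernel with
    power + 1 taps (e.g. 11 taps for cos^10: the pixel of interest plus
    five pixels on each side).
    """
    lowpass = [comb(power, k) / 2 ** power for k in range(power + 1)]
    kernel = [-c for c in lowpass]
    kernel[power // 2] += 1.0  # unit impulse at the pixel of interest
    return kernel


def filter_row(row, kernel):
    """Apply the kernel along one row, replicating edge pixels (assumption)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, tap in enumerate(kernel):
            j = min(max(i + k - half, 0), len(row) - 1)
            acc += tap * row[j]
        out.append(acc)
    return out


# Hypothetical labels mapped to the cosine powers listed in the text.
HPF_POWERS = {"HPF-R3": 10, "HPF-R2": 6, "HPF-R1": 2, "HPF-C2": 4, "HPF-C1": 2}
```

Under this reading, a flat region of the frame produces zero output from every filter, and the same `filter_row` routine can be applied to a column of pixels for the HPF-Cx filters.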

The high pass filters can also be considered as digital equivalents of optical apertures. The higher the cut-off frequency, the smaller the aperture. Thus, filters HPF-R1 and HPF-C1 retain only very small details in the frame (of 1-4 pixels in size) while filter HPF-R3 retains much larger details (of up to 11 pixels).

In the following, the filtered frames will be labeled by the type of filter (HPF-X) used to create them.

Returning to FIG. 3, analyzer 50 also generates difference frames between the current frame and another, earlier frame. The previous frame is typically at most 15 frames earlier. A “group” of pictures or frames (GOP) is a series of frames for which difference frames are generated.

Parameter estimator 52 takes the current frame and the filtered and difference frames and generates a set of parameters that describe the information content of the current frame. The parameters are determined on a pixel-by-pixel basis or on a per frame basis, as relevant. It is noted that the parameters do not have to be calculated to great accuracy as they are used in combination to determine a per pixel, visual perception threshold THD_(i).

At least some of the following parameters are determined:

Signal to noise ratio (SNR): this parameter can be determined by generating a difference frame between the current frame and the frame before it, high pass filtering the difference frame, summing the intensities of the pixels in the filtered frame, and normalizing the sum by both the number of pixels N in a frame and the maximum intensity I_(MAX) possible for the pixel. If the frame is a television frame, the maximum intensity is 255 quanta (8 bits). The high frequency filter selects only those intensities lower than 3σ, where σ indicates a level less than which the human eye cannot perceive noise. For example, σ can be 46 dB, equivalent to a reduction in signal strength of a factor of 200.
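The correspondence between 46 dB and a factor of 200 can be checked with the usual amplitude decibel relation, 20·log10(ratio). The short sketch below assumes that relation is the one intended; the function names are illustrative only.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Decibels for an amplitude ratio (assumes the 20*log10 convention)."""
    return 20 * math.log10(ratio)


def db_to_amplitude_ratio(db):
    """Amplitude ratio for a decibel value (assumes the 20*log10 convention)."""
    return 10 ** (db / 20)
```

With this convention a factor of 200 is slightly more than 46 dB, which matches the rounded figure given in the text.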

Normalized NΔ_(i): this measures the change Δ_(i), per pixel i, from the current frame to its previous frame. This value is then normalized by the maximum intensity I_(MAX) possible for the pixel.

Normalized volume of intraframe change NI_(XY): this measures the volume of change in a frame I_(XY) (or how much detail there is in a frame), normalized by the maximum possible amount of information MAX_(INFO) within a frame (i.e. 8 bits per pixel × N pixels per frame). Since the highest frequency range indicates the amount of change in a frame, the volume of change I_(XY) is a sum of the intensities in the filtered frame having the highest frequency range, such as filtered frame HPF-R1.

Normalized volume of interframe changes NI_(F): this measures the volume of changes I_(F) between the current frame and its previous frame, normalized by the maximum possible amount of information MAX_(INFO) within a frame. The volume of interframe changes I_(F) is the sum of the intensities in the difference frame.

Normalized volume of change within a group of frames NI_(GOP): this measures the volume of changes I_(GOP) over a group of frames, where the group is from 2 to 15 frames, as selected by the user. It is normalized by the maximum possible amount of information MAX_(INFO) within a frame and by the number of frames in the group.

Normalized luminance level NY_(i): Y_(i) is the luminance level of a pixel in the current frame. It is normalized by the maximum intensity I_(MAX) possible for the pixel.
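A minimal sketch of the normalized parameters described above follows. It assumes frames are lists of rows of 8-bit values, that "intensities" of difference or filtered frames are summed as absolute values, and that MAX_(INFO) is taken literally as 8 × N; none of these choices is spelled out in the text, and the function names are hypothetical.

```python
I_MAX = 255          # maximum intensity of an 8-bit pixel
BITS_PER_PIXEL = 8   # used for MAX_INFO = 8 * N, read literally from the text


def normalized_delta(current, previous):
    """Per-pixel change between two frames, normalized by I_MAX."""
    return [[abs(c - p) / I_MAX for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]


def normalized_volume(frame):
    """Sum of absolute intensities of a filtered or difference frame,
    normalized by MAX_INFO (assumed to be 8 bits * number of pixels)."""
    n_pixels = sum(len(row) for row in frame)
    max_info = BITS_PER_PIXEL * n_pixels
    return sum(abs(v) for row in frame for v in row) / max_info


def normalized_luminance(frame):
    """Per-pixel luminance normalized by I_MAX."""
    return [[y / I_MAX for y in row] for row in frame]
```

Passing the highest-frequency filtered frame to `normalized_volume` would give NI_(XY), and passing the difference frame would give NI_(F), under the stated assumptions.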

Color saturation p_(i): this is the color saturation level of the ith pixel and it is determined by:

$p_i = \left[ 0.78\left( \frac{C_{r,i} - 128}{160} \right)^{2} + 0.24\left( \frac{C_{b,i} - 128}{126} \right)^{2} \right]^{1/2}$

where C_(r,i) and C_(b,i) are the chrominance levels of the ith pixel.

Hue h_(i): this is the general hue of the ith pixel and is determined by:

$h_i = \arctan\left( 1.4\,\frac{C_{r,i} - 128}{C_{b,i} - 128} \right)$
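The saturation and hue expressions can be evaluated per pixel as in the following sketch. The handling of C_(b,i) = 128, where the ratio inside the arctangent is undefined, is an assumption added here (the limit of the arctangent is used, and 0 is returned for an achromatic pixel); the function names are illustrative.

```python
import math


def color_saturation(cr, cb):
    """Color saturation p_i from 8-bit chrominance values."""
    return math.sqrt(0.78 * ((cr - 128) / 160) ** 2
                     + 0.24 * ((cb - 128) / 126) ** 2)


def hue_angle(cr, cb):
    """Hue h_i in radians, as arctan(1.4 * (Cr - 128) / (Cb - 128))."""
    numerator = 1.4 * (cr - 128)
    denominator = cb - 128
    if denominator == 0:
        if numerator == 0:
            return 0.0  # assumption: achromatic pixel, hue undefined
        return math.copysign(math.pi / 2, numerator)  # limit of arctan
    return math.atan(numerator / denominator)
```

A pixel with both chrominance components at 128 has zero saturation, as expected for white, grey or black.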

Alternatively, hue h_(i) can be determined by interpolating Table 1,below.

Response to hue R_(i)(h_(i)): this is the human vision response to a given hue and is given by Table 1, below. Interpolation is typically used to produce a specific value of the response R(h) for a specific value of hue h.

TABLE 1
Color     Y     C_(r)   C_(b)   h (nm)   R(h)
White     235   128     128     -        -
Yellow    210   16      146     575      0.92
Cyan      170   166     16      490      0.21
Green     145   54      34      510      0.59
Magenta   106   202     222     -        0.2
Red       81    90      240     630      0.3
Blue      41    240     110     475      0.11
Black     16    128     128     -        -

Visual perception threshold determiner 54 determines the visual perception threshold THD_(i) per pixel as follows:

$THD_i = THD_{min}\left( 1 + N\Delta_i + NI_{XY} + NI_F + NI_{GOP} + NY_i + p_i + \left( 1 - R_i(h_i) \right) + \frac{200}{SNR} \right)$

Subclass determiner 56 compares each pixel i of each high pass filtered frame HPF-X to its associated threshold THD_(i) to determine whether or not that pixel is significantly present in each filtered frame, where “significantly present” is defined by the threshold level and by the “detail dimension” (i.e. the size of the object or detail in the image of which the pixel forms a part). Subclass determiner 56 then defines the subclass to which the pixel belongs.

For the example provided above, if the pixel is not present in any of the filtered frames, the pixel must belong to an object of large size or the detail is only detected but not distinguished. If the pixel is only found in the filtered frame of HPF-C2 or in both frames HPF-C1 and HPF-C2, it must be a horizontal edge (an edge in the Y direction of the frame). If it is found in filtered frames HPF-R3 and HPF-C2, it is a single small detail. If the pixel is found only in filtered frames HPF-R1, HPF-R2 and HPF-R3, it is a very small vertical edge. If, in addition, it is also found in filtered frame HPF-C2, then the pixel is a very small, single detail.

The above logic is summarized and expanded in Table 2.

TABLE 2
            High Pass Filters
Subclass   R1   R2   R3   C1   C2   Remarks
1          0    0    0    0    0    Large detail or detected detail only
2          0    0    0    0    1    Horizontal edge
3          0    0    0    1    1    Horizontal edge
4          0    0    1    0    0    Vertical edge
5          0    0    1    0    1    Single small detail
6          0    0    1    1    1    Single small detail
7          0    1    1    0    0    Vertical edge
8          0    1    1    0    1    Single small detail
9          0    1    1    1    1    Single small detail
10         1    1    1    0    0    Very small vertical edge
11         1    1    1    0    1    Very small single detail
12         1    1    1    1    1    Very small single detail
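Table 2 amounts to a lookup from the pattern of filtered frames in which a pixel is significantly present to a subclass number. The sketch below assumes "significantly present" means that the magnitude of the filtered pixel exceeds THD_(i), and it returns None for patterns that Table 2 does not list; both are illustrative choices, as are the names.

```python
# Keys are presence flags in the order (R1, R2, R3, C1, C2), as in Table 2.
SUBCLASS_BY_PATTERN = {
    (0, 0, 0, 0, 0): 1,
    (0, 0, 0, 0, 1): 2,
    (0, 0, 0, 1, 1): 3,
    (0, 0, 1, 0, 0): 4,
    (0, 0, 1, 0, 1): 5,
    (0, 0, 1, 1, 1): 6,
    (0, 1, 1, 0, 0): 7,
    (0, 1, 1, 0, 1): 8,
    (0, 1, 1, 1, 1): 9,
    (1, 1, 1, 0, 0): 10,
    (1, 1, 1, 0, 1): 11,
    (1, 1, 1, 1, 1): 12,
}


def classify_pixel(filtered_values, thd):
    """Return the Table 2 subclass for one pixel.

    filtered_values: the pixel's values in the HPF-R1, HPF-R2, HPF-R3,
    HPF-C1 and HPF-C2 frames, in that order. Patterns not listed in
    Table 2 return None (assumption: the text does not cover them).
    """
    pattern = tuple(int(abs(v) > thd) for v in filtered_values)
    return SUBCLASS_BY_PATTERN.get(pattern)
```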

The output of subclass determiner 56 is an indication of the subclass to which each pixel of the current frame belongs. Intra-frame processor 44 performs spatial filtering of the frame, where the type of filter utilized varies in accordance with the subclass to which the pixel belongs.

In accordance with a preferred embodiment of the present invention, intra-frame processor 44 filters each subclass of the frame differently and according to the information content of the subclass. The filtering limits the bandwidth of each subclass, which is equivalent to sampling the data at different frequencies. Subclasses with a lot of content are sampled at a high frequency while subclasses with little content, such as a plain background area, are sampled at a low frequency.

Another way to consider the operation of the filters is that they smooth the data of the subclass, removing “noisiness” in the picture that the human eye does not perceive. Thus, intra-frame processor 44 changes the intensity of the pixel by an amount less than the visual distinguishing threshold for that pixel. Pixels whose contrast is lower than the threshold (i.e. details which were detected only) are transformed with non-linear filters. If desired, the data size of the detected only pixels can be reduced from 8 bits to 1 or 2 bits, depending on the visual threshold level and the detail dimension for the pixel. For the other pixels (i.e. the distinguished ones), 3 or 4 bits are sufficient.

Intra-frame processor 44 comprises a controllable filter bank 60 and a filter selector 62. Controllable filter bank 60 comprises a set of low pass and non-linear filters, shown in FIGS. 5A and 5B to which reference is now made, which filter selector 62 activates, based on the subclass to which the pixel belongs. Selector 62 can activate more than one filter, as necessary.

FIGS. 5A and 5B are two alternative embodiments of controllable filter bank 60. Both comprise two sections 64 and 66 which operate on columns (i.e. line to line) and on rows (i.e. within a line), respectively. In each section 64 and 66, there is a choice of filters, each controlled by an appropriate switch, labeled SW-X, where X is one of C1, C2, R1, R2, R3 (selecting one of the low pass filters (LPF)), D-C, D-R (selecting to pass the relevant pixel directly). Filter selector 62 switches the relevant switch, thereby activating the relevant filter. It is noted that the non-linear filters NLF-R and NLF-C are activated by switches R3 and C2, respectively. Thus, the outputs of non-linear filters NLF-R and NLF-C are added to the outputs of low pass filters LPF-R3 and LPF-C2, respectively.

Controllable filter bank 60 also includes time aligners (TA) which add any necessary delays to ensure that the pixel currently being processed remains at its appropriate location within the frame.

The low pass filters (LPF) are associated with the high pass filters used in analyzer 50. Thus, the cutoff frequencies of the low pass filters are close to those of the high pass filters. The low pass filters thus pass that which their associated high pass filters ignore.

FIG. 6, to which reference is now briefly made, illustrates exemplary low pass filters for the example provided hereinabove. Low pass filter LPF-R3 has a cutoff frequency of 0.5 MHz and thus, generally does not retain anything which its associated high pass filter HPF-R3 (with a cutoff frequency of 1 MHz) retains. Filter LPF-R2 has a cutoff frequency of 1 MHz, filter LPF-C2 has a cutoff frequency of 1.25 MHz and filters LPF-C1 and LPF-R1 have a cutoff frequency of about 2 MHz. As with the high pass filters, filters LPF-Cx operate on the columns of the frame and filters LPF-Rx operate on the rows of the frame.

FIG. 7, to which reference is now briefly made, illustrates an exemplary transfer function for the non-linear filters (NLF) which models the response of the eye when detecting a detail. The transfer function defines an output value Vout normalized by the threshold level THD_(i) as a function of an input value Vin also normalized by the threshold level THD_(i). As can be seen in the figure, the input-output relationship is described by a polynomial of high order. A typical order might be six, though lower orders, of power two or three, are also feasible.

Table 3 lists the type of filters activated per subclass, where the header for the column indicates both the type of filter and the label of the switch SW-X of FIGS. 5A and 5B.

TABLE 3
            Low Pass Filters
Subclass   R1   R2   R3   C1   C2   D-R   D-C
1          0    0    1    0    1    0     0
2          0    0    1    1    0    0     0
3          0    0    1    0    0    0     1
4          0    1    0    0    1    0     0
5          0    1    0    1    0    0     0
6          0    1    0    0    0    0     1
7          1    0    0    0    1    0     0
8          1    0    0    1    0    0     0
9          1    0    0    0    0    0     1
10         0    0    0    0    1    1     0
11         0    0    0    1    0    1     0
12         0    0    0    0    0    1     1
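The role of filter selector 62 can be illustrated as a table lookup that turns a subclass number into the set of switches to close. The data below is transcribed from Table 3; the function name and the string form of the switch labels are hypothetical.

```python
SWITCH_NAMES = ("R1", "R2", "R3", "C1", "C2", "D-R", "D-C")

# Rows of Table 3: subclass -> flags in the order of SWITCH_NAMES.
TABLE_3 = {
    1:  (0, 0, 1, 0, 1, 0, 0),
    2:  (0, 0, 1, 1, 0, 0, 0),
    3:  (0, 0, 1, 0, 0, 0, 1),
    4:  (0, 1, 0, 0, 1, 0, 0),
    5:  (0, 1, 0, 1, 0, 0, 0),
    6:  (0, 1, 0, 0, 0, 0, 1),
    7:  (1, 0, 0, 0, 1, 0, 0),
    8:  (1, 0, 0, 1, 0, 0, 0),
    9:  (1, 0, 0, 0, 0, 0, 1),
    10: (0, 0, 0, 0, 1, 1, 0),
    11: (0, 0, 0, 1, 0, 1, 0),
    12: (0, 0, 0, 0, 0, 1, 1),
}


def active_switches(subclass):
    """Return the SW-X labels that Table 3 marks active for a subclass."""
    flags = TABLE_3[subclass]
    return ["SW-" + name for name, on in zip(SWITCH_NAMES, flags) if on]
```

Each row activates exactly one row-direction path (R1, R2, R3 or D-R) and one column-direction path (C1, C2 or D-C), which matches the two-section structure of the filter bank.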

FIG. 5B includes rounding elements RND which reduce the number of bits of a pixel from eight to three or four bits, depending on the subclass to which the pixel belongs. Table 4 illustrates the logic for the example presented hereinabove, where the items which are not active for the subclass are indicated by “N/A”.

TABLE 4
Subclass   RND-R0 (Z1)   RND-R1 (Z2)   RND-R2 (Z3)   RND-C0 (Z4)   RND-C1 (Z5)
1          N/A           N/A           N/A           N/A           N/A
2          N/A           N/A           N/A           N/A           4 bit
3          N/A           N/A           N/A           4 bit         N/A
4          N/A           N/A           4 bit         N/A           N/A
5          N/A           N/A           4 bit         N/A           4 bit
6          N/A           N/A           4 bit         4 bit         N/A
7          N/A           4 bit         N/A           N/A           N/A
8          N/A           3 bit         N/A           N/A           3 bit
9          N/A           3 bit         N/A           3 bit         N/A
10         4 bit         N/A           N/A           N/A           N/A
11         3 bit         N/A           N/A           N/A           3 bit
12         3 bit         N/A           N/A           3 bit         N/A
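The text does not say how the rounding elements reduce an eight-bit value to three or four bits. One plausible sketch, offered purely as an assumption, rounds to the nearest multiple of the coarser quantization step and clamps to the highest representable level; `round_to_bits` is a hypothetical name.

```python
def round_to_bits(value, bits, source_bits=8):
    """Quantize an unsigned value to `bits` bits of precision.

    Assumption: round to the nearest multiple of the step
    2**(source_bits - bits), then clamp to the top level. The result
    stays on the original scale (e.g. 0..240 in steps of 16 for 4 bits).
    """
    step = 1 << (source_bits - bits)
    max_value = (1 << source_bits) - 1
    top_level = max_value - (max_value % step)
    rounded = ((value + step // 2) // step) * step
    return min(rounded, top_level)
```

With four bits the step is 16 and with three bits it is 32, so the coarser setting leaves only eight distinct levels for a pixel.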

The output of intra-frame processor 44 is a processed version of the current frame which uses fewer bits to describe the frame than the original version.

Reference is now made to FIGS. 8A, 8B and 8C, which illustrate three alternative embodiments for inter-frame processor 48 which provides temporal filtering (i.e. inter-frame filtering) to further process the current frame. Since the present invention provides a full frame as output, inter-frame processor 48 determines which pixels have changed significantly from the previous frame and amends those only, storing the new version in the appropriate location in output frame memory 46.

The embodiments of FIGS. 8A and 8B are open loop versions (i.e. the previous frame is the frame previously input into inter-frame processor 48) while the embodiment of FIG. 8C is a closed loop version (i.e. the previous frame is the frame previously produced by inter-frame processor 48). All of the embodiments comprise a summer 68, a low pass filter (LPF) 70, a high pass filter (HPF) 72, two comparators 74 and 76, two switches 78 and 80, controlled by the results of comparators 74 and 76, respectively, and a summer 82. FIGS. 8A and 8B additionally include an intermediate memory 84 for storing the output of intra-frame processor 44.

Summer 68 takes the difference of the processed current frame, produced by processor 44, and the previous frame, stored in either intermediate memory 84 (FIGS. 8A and 8B) or in frame memory 46 (FIG. 8C). The difference frame is then processed in two parallel tracks.

In the first track, the low pass filter is used. Each pixel of the filtered frame is compared to a general, large detail, threshold THD-LF which is typically set to 5% of the maximum expected intensity for the frame. Thus, the pixels which are kept are only those which changed by more than 5% (i.e. those whose changes can be “seen” by the human eye).

In the second track, the difference frame is high pass filtered. Since high pass filtering retains the small details, each pixel of the high pass filtered frame is compared to the particular threshold THD_(i) for that pixel, as produced by threshold determiner 54. If the difference pixel has an intensity above the threshold THD_(i) (i.e. the change in the pixel is significant for detailed visual perception), it is allowed through (i.e. switch 80 is set to pass the pixel).

Summer 82 adds the filtered difference pixels passed by switches 78 and/or 80 to the pixel of the previous frame to produce the new pixel. If switches 78 and 80 did not pass anything, the new pixel is the same as the previous pixel. Otherwise, the new pixel is the sum of the previous pixel and the low and high frequency components of the difference pixel.
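The two-track update performed by the comparators, switches and summer 82 can be sketched per pixel as follows. Comparing the magnitude of each difference component to its threshold is an assumption (the text speaks only of an intensity "above the threshold"), and the function and argument names are hypothetical.

```python
def update_pixel(previous, diff_low, diff_high, thd_lf, thd_i):
    """Produce the new pixel from the previous pixel and the filtered
    difference components.

    diff_low:  low pass filtered difference for this pixel (first track)
    diff_high: high pass filtered difference for this pixel (second track)
    thd_lf:    general large-detail threshold THD-LF, e.g. 5% of I_MAX
    thd_i:     per-pixel visual perception threshold THD_i
    """
    new_pixel = previous
    if abs(diff_low) > thd_lf:    # switch 78 passes the low frequency part
        new_pixel += diff_low
    if abs(diff_high) > thd_i:    # switch 80 passes the high frequency part
        new_pixel += diff_high
    return new_pixel
```

For an 8-bit frame, THD-LF at 5% of the maximum intensity would be 12.75, so a low frequency change of 3 quanta leaves the pixel untouched while a change of 20 quanta is passed through.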

Reference is now briefly made to FIGS. 9, 10A, 10B and 11 which detail elements of frame analyzer 42. In these figures, the term “ML” indicates a memory line of the current frame, “MP” indicates a memory pixel of the current frame, “MF” indicates a memory frame, “VD” indicates the vertical drive signal, “TA” indicates a time alignment, e.g. a delay, and “CNT” indicates a counter.

FIG. 9 generally illustrates the operation of spatial-temporal analyzer 50 and FIGS. 10A and 10B provide one detailed embodiment for the spatial analysis and temporal analysis portions 51 and 53, respectively. FIG. 11 details parameter estimator 52, threshold determiner 54 and subclass determiner 56. As these figures are deemed to be self-explanatory, no further explanation will be included here.

It is noted that the present invention can be implemented with a field programmable gate array (FPGA) and the frame memory can be implemented with SRAM or SDRAM.

The methods and apparatus disclosed herein have been described without reference to specific hardware or software. Rather, the methods and apparatus have been described in a manner sufficient to enable persons of ordinary skill in the art to readily adapt commercially available hardware and software as may be needed to reduce any of the embodiments of the present invention to practice without undue experimentation and using conventional techniques.

It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described herein above. Rather the scope of the invention is defined by the claims that follow:

1. A visual perception threshold unit for image processing, the threshold unit comprising: a parameter generator to generate a multiplicity of parameters that describe at least some of the information content of at least one video frame to be processed; and a threshold generator to generate from said parameters, a plurality of visual perception threshold levels to be associated with the pixels of the at least one video frame, wherein said threshold levels define contrast levels above which a human eye can distinguish a pixel from among its neighboring pixels of said at least one video frame.
2. A unit according to claim 1, wherein said parameter generator comprises at least one of the following units: a volume unit which determines a volume of information in said at least one video frame; a color unit which determines a per pixel color; and an intensity unit which determines a cross-frame change of intensity.
3. A method of generating visual perception thresholds for image processing implemented by one or more elements of a video encoding device, the method comprising: analyzing details of frames of a video signal; estimating parameters of said details; and defining a visual perception threshold for each of said details in accordance with said estimated detail parameters, wherein said estimating comprises at least one of the following: determining a per-pixel signal intensity change between a current frame and a previous frame, normalized by a maximum intensity; determining a normalized volume of intraframe change by high frequency filtering of said current frame, summing the intensities of said filtered frame and normalizing the resultant sum by a maximum possible amount of information within a frame; generating a volume of inter-frame changes between said current frame and its said previous frame normalized by said maximum possible amount of information volume within a frame; generating a normalized volume of inter-frame changes for a group of frames from the output of said previous step of generating; evaluating a signal-to-noise ratio by high pass filtering a difference frame between said current frame and its said previous frame by selecting those intensities of said difference frame lower than a threshold defined as three times a noise level under which noise intensities are not perceptible to the human eye, summing the intensities of the pixels in the filtered difference frame and normalizing said sum by said maximum intensity and by a total number of pixels in a frame; generating a normalized intensity value per-pixel; generating a per-pixel color saturation level; generating a per-pixel hue value; and determining a per-pixel response to said hue value.
4. A method for describing an image implemented by one or more elements of a video encoding device, the method comprising determining which details in said image can be distinguished by the human eye and which ones can only be detected by it; providing one bit to describe a pixel which can only be detected by the human eye; and providing three bits to describe a pixel which can be distinguished by the human eye.
5. A video compression system comprising: a parameter generator to generate one or more parameters that describe information content of a video frame; and a threshold generator to generate, from at least one of the parameters, a plurality of visual perception threshold levels to be associated with pixels of the video frame, wherein said threshold levels define contrast levels above which a pixel of the video frame can be visually distinguished from its neighboring pixels of the video frame.
6. A video compression system according to claim 5, wherein the parameter generator comprises a volume unit configured to determine a volume of information in the video frame.
7. A video compression system according to claim 5, wherein the parameter generator comprises a color unit configured to determine a per pixel color.
8. A video compression system according to claim 5, wherein the parameter generator comprises an intensity unit configured to determine a cross-frame change of intensity.
9. A video compression system according to claim 5, wherein the parameter generator and the threshold generator are implemented in a field programmable gate array (FPGA).
10. A video encoder comprising: a parameter generator to generate multiple parameters that describe information content of a video frame; and a threshold generator to generate, from at least one of the multiple parameters, a plurality of visual perception threshold levels to be associated with pixels of the video frame, wherein said threshold levels define contrast levels above which a pixel of the video frame can be visually distinguished from its neighboring pixels of the video frame.
11. A video encoder according to claim 10, wherein the parameter generator comprises a volume unit configured to determine a volume of information in the video frame.
12. A video encoder according to claim 10, wherein the parameter generator comprises a color unit configured to determine a per pixel color.
13. A video encoder according to claim 10, wherein the parameter generator comprises an intensity unit configured to determine a cross-frame change of intensity.
14. A video encoder according to claim 10, wherein the video encoder is embodied in a field programmable gate array (FPGA).
15. A video encoder according to claim 10, wherein the video encoder comprises a visual lossless syntactic encoder.
16. A system comprising: means for generating one or more parameters that describe information content of a video frame; and means for generating, from at least one of the parameters, a plurality of visual perception threshold levels to be associated with pixels of the video frame, wherein said threshold levels define contrast levels above which a pixel of the video frame can be visually distinguished from its neighboring pixels of the video frame.
17. A system according to claim 16, wherein the one or more parameters are associated with at least one of: a volume of information in the video frame; a cross-frame change of intensity; or a per pixel color.
18. A system according to claim 16, wherein the system is embodied in a field programmable gate array (FPGA).
19. A video compression system comprising: means for analyzing one or more details associated with one or more frames of a video signal; means for estimating parameters of individual analyzed details; and means for defining a visual perception threshold for individual analyzed details in accordance with at least one of the estimated parameters, wherein said means for estimating comprises at least one of: means for determining a per-pixel signal intensity change between a current frame and a previous frame, normalized by a maximum intensity; means for determining a normalized volume of intraframe change by high frequency filtering of said current frame, summing the intensities of said filtered frame and normalizing the resultant sum by a maximum possible amount of information within a frame; means for generating a volume of inter-frame changes between said current frame and said previous frame normalized by said maximum possible amount of information within a frame; means for generating a normalized volume of inter-frame changes within a group of frames normalized by said maximum possible amount of information within a frame and by a number of frames comprising said group of frames; means for evaluating a signal-to-noise ratio by high pass filtering a difference frame between said current frame and said previous frame by selecting intensities of said difference frame lower than a threshold defined as three times a noise level under which noise intensities are not visually perceptible, summing the intensities of pixels in the filtered difference frame and normalizing said sum by said maximum intensity and by the total number of pixels in a frame; means for generating a normalized intensity value per-pixel; means for generating a per-pixel color saturation level; means for generating a per-pixel hue value; or means for determining a per-pixel response to said hue value.
20. A video compression system according to claim 19, embodied in a field programmable gate array (FPGA).
21. A method implemented by one or more elements of a video encoding device comprising: identifying one or more distinguishable details in an image, individual distinguishable details being associated with a contrast level at which a pixel can be visually distinguished from among its neighboring pixels; using a plurality of bits to describe individual identified distinguishable details; and using less than said plurality of bits to describe one or more individual details in the image not identified as distinguishable.
22. A method according to claim 21, further comprising identifying the one or more of the individual details in the image not identified as distinguishable as being visually detectable.
23. A method according to claim 21, wherein using the plurality of bits comprises using three bits.
24. A method according to claim 21, further comprising performing the identifying, the using the plurality of bits, and the using less than said plurality of bits by the one or more elements embodied in a field programmable gate array (FPGA).
25. A system comprising: means for identifying one or more distinguishable details in an image, individual distinguishable details being associated with a contrast level at which a pixel can be visually distinguished from among its neighboring pixels; means for using a plurality of bits to describe individual identified distinguishable details; and means for using less than said plurality of bits to describe one or more individual details in the image not identified as distinguishable.
26. A system according to claim 25, wherein one or more of the individual details in the image not identified as distinguishable are identified as being visually detectable.
27. A system according to claim 25, wherein the plurality of bits comprises three bits.
28. A system according to claim 25 comprising part of a field programmable gate array (FPGA).
29. A visual perception threshold unit according to claim 1, wherein one or both of the parameter generator or the threshold generator are implemented in a video encoder.
30. A visual perception threshold unit according to claim 29, wherein the video encoder comprises a visual lossless syntactic encoder.
31. A visual perception threshold unit according to claim 1 comprising part of a field programmable gate array (FPGA).